Hash Collisions: Why Do Hash Tables Collide? How to Resolve Them?

Hash tables map keys to array positions using hash functions, but when different keys map to the same position, a hash collision occurs. The core reasons are either the number of keys far exceeding the array capacity or an uneven hash function. The key to resolving collisions is to ensure conflicting keys "occupy distinct positions." Common methods include: 1. **Chaining (Zipper Method)**: The most widely used approach, where each array position is a linked list. Conflicting keys are appended sequentially to the corresponding linked list (e.g., keys 5, 1, and 9 colliding would form a list: 5→1→9). This method is simple to implement, has high space utilization, and allows efficient traversal during lookups. 2. **Open Addressing**: When a collision occurs, vacant positions are sought in subsequent slots. This includes linear probing (step size 1), quadratic probing (step size as a square), and double hashing (multiple hash functions). However, it may cause clustering and is more complex to implement. 3. **Public Overflow Area**: The main array stores non-colliding keys, while colliding keys are placed in an overflow area. Lookups require traversing both the main array and the overflow area, but space allocation is difficult. The choice of collision resolution method depends on the scenario. Chaining is widely adopted due to its efficiency and versatility. Understanding hash collisions and their solutions is crucial for optimizing hash table performance.

Read More
Hash Functions: How Do Hash Functions Generate Hash Values? A Must-Know for Beginners

A hash function is a "translator" that converts input of arbitrary length into a fixed-length hash value, which serves as the "ID number" of the data. Its core characteristics include: fixed length (e.g., MD5 produces 32 hexadecimal characters), one-way irreversibility (original data cannot be derived from the hash value), near-uniqueness (extremely low collision probability), and the avalanche effect (minor input changes lead to drastic hash value changes). The generation process consists of three steps: input preprocessing into binary, segmented mathematical operations, and merging the results. Unlike encryption functions, hash functions are one-way and do not require a key, while encryption is reversible and requires a key. They have extensive applications: file verification (comparing hash values to prevent tampering), password storage (storing hash values for security), data indexing, and data distribution in distributed systems. As a data fingerprint, the key characteristics of hash functions make them indispensable in security and verification.

Read More
How Does a Hash Table Store Data? Hash Functions Simplify Lookup

The article uses the analogy of searching for a book to introduce the problem: sequential search of data (such as arrays) is inefficient, while a hash table is an efficient storage tool. The core of a hash table is the hash function, which maps data to "buckets" (array positions) to enable fast access and storage. The hash function converts data into a hash value (bucket number), for example, taking the last two digits of a student ID to get the hash value. When storing, the hash value is first calculated to locate the bucket. If multiple data items have the same hash value (collision), it can be resolved using the linked list method (chaining within the bucket) or open addressing (finding the next empty bucket). When searching, the hash value is directly calculated to locate the bucket, and then the search is performed within the bucket, eliminating the need to traverse all data, resulting in extremely fast speed. Hash tables are widely used (such as in address books and caches), with their core being the use of a hash function to transform the search from "scanning through everything" to "direct access".

Read More